N=1:

rank | frequency | n-gram |
---|---|---|
1 | 9448 | -н |
2 | 7765 | -е |
3 | 6549 | -м |
4 | 4788 | -т |
5 | 3539 | -ш |

N=2:

rank | frequency | n-gram |
---|---|---|
1 | 5100 | -ым |
2 | 3725 | -ын |
3 | 3500 | -ан |
4 | 2211 | -ак |
5 | 2205 | -ат |

N=3:

rank | frequency | n-gram |
---|---|---|
1 | 2314 | -лан |
2 | 1777 | -ште |
3 | 1400 | -лак |
4 | 984 | -кым |
5 | 923 | -ыже |

N=4:

rank | frequency | n-gram |
---|---|---|
1 | 1374 | -влак |
2 | 1161 | -ыште |
3 | 647 | -акым |
4 | 588 | -ский |
5 | 520 | -аште |

N=5:

rank | frequency | n-gram |
---|---|---|
1 | 1337 | --влак |
2 | 610 | -лакым |
3 | 460 | -лакын |
4 | 312 | -штыже |
5 | 269 | -лаште |

The tables show the most frequent letter N-grams at the end of words for N=1…5. The procedure parallels section 2.2.5, Most frequent word beginnings; here the aim is to detect suffixes rather than prefixes.
For N=3:
SELECT @pos := (@pos + 1), xx.* FROM (SELECT @pos := 0) r, (SELECT COUNT(*) AS cnt, CONCAT("-", RIGHT(word, 3)) FROM words WHERE w_id > 100 GROUP BY RIGHT(word, 3) ORDER BY cnt DESC) xx LIMIT 5;
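The same ranking can be sketched outside the database. The following is a minimal Python illustration, not the code used to produce the tables: the function name `top_endings` and the sample word list are hypothetical, the input is assumed to be a list of distinct word forms (one entry per row of `words`), and the `w_id > 100` filter of the query is not reproduced.

```python
from collections import Counter


def top_endings(words, n, limit=5):
    """Return (rank, frequency, ending) rows for the most frequent word-final n-grams.

    `words` is assumed to hold distinct word forms and `n` to be at least 1.
    """
    # Like RIGHT(word, n) in the query, word[-n:] yields the whole word
    # when the word is shorter than n characters.
    counts = Counter(word[-n:] for word in words)
    # Sort by descending frequency; ties are broken alphabetically so the
    # result is deterministic (the SQL query leaves tie order unspecified).
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    # Prefix each ending with "-" as CONCAT("-", ...) does in the query.
    return [
        (rank, freq, "-" + ending)
        for rank, (ending, freq) in enumerate(ranked[:limit], start=1)
    ]


# Hypothetical sample input, for illustration only (not corpus data).
sample_words = ["книга-влак", "пӧрт-влак", "олаште", "ялыште", "мыланем"]
for n in range(1, 6):
    print(n, top_endings(sample_words, n, limit=2))
```

Because the ending is always prefixed with "-", a 5-gram that itself begins with a hyphen is displayed with two hyphens, which is presumably how the entry --влак in the N=5 table arises.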